Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform

Authors

  • Toshio Irino
  • Roy D. Patterson
Abstract

We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal tract from information about its shape. The duration of the impulse response of the vocal tract expands or contracts as the length of the vocal tract increases or decreases. There is a transform, the Mellin transform, that is immune to the effects of time dilation; it maps impulse responses that differ in temporal scale onto a single distribution and encodes the size information separately as a scalar constant. In this paper we investigate the use of the Mellin transform for vowel normalisation. In the auditory system, sounds are initially subjected to a form of wavelet analysis in the cochlea and then, in each frequency channel, the repeating patterns produced by periodic sounds appear to be stabilised by a form of time-interval calculation. The result is like a two-dimensional array of interval histograms and it is referred to as an auditory image. In this paper, we show that there is a two-dimensional form of the Mellin transform that can convert the auditory images of vowel sounds from vocal tracts with different sizes into an invariant Mellin image (MI) and, thereby, facilitate the extraction and separation of the size and shape information associated with a given vowel type. In signal processing terms, the MI of a sound is the Mellin transform of a stabilised wavelet transform of the sound. We suggest that the MI provides a good model of auditory vowel normalisation, and that this provides a good framework for auditory processing from cochlea to cortex. © 2000 Elsevier Science B.V. All rights reserved.
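The scale-invariance property mentioned in the abstract can be illustrated numerically. The following is only a minimal one-dimensional sketch, not the authors' two-dimensional Mellin image or their auditory model: it assumes the Mellin transform is evaluated on the imaginary axis (s = jc), uses a made-up damped resonance as a stand-in for a vocal-tract impulse response, and the names `mellin_on_imag_axis`, `impulse_response` and `scale` are hypothetical. Under those assumptions, dilating the signal in time by a factor a multiplies the transform by a^(-jc), so the magnitude is unchanged and the dilation factor appears only in the phase.

```python
import numpy as np

def mellin_on_imag_axis(f, c_values, u_min=-12.0, u_max=6.0, n=8192):
    """Approximate M(jc) = integral_0^inf f(t) * t**(jc - 1) dt.

    Substituting t = exp(u) turns this into a Fourier-type integral
    over log-time: integral f(exp(u)) * exp(1j * c * u) du.
    """
    u = np.linspace(u_min, u_max, n)           # uniform grid in log-time
    du = u[1] - u[0]
    samples = f(np.exp(u))                     # signal sampled at t = exp(u)
    kernel = np.exp(1j * np.outer(c_values, u))
    return (kernel * samples).sum(axis=1) * du  # Riemann sum over u

def impulse_response(t):
    # Hypothetical damped resonance, chosen only for illustration.
    return t * np.exp(-t) * np.cos(3.0 * t)

c = np.linspace(0.5, 5.0, 10)   # Mellin variable (assumed range)
scale = 1.3                     # assumed time-dilation factor

m_ref = mellin_on_imag_axis(impulse_response, c)
m_dil = mellin_on_imag_axis(lambda t: impulse_response(scale * t), c)

# Magnitudes should agree: the "shape" part is invariant to dilation.
max_mag_diff = np.max(np.abs(np.abs(m_ref) - np.abs(m_dil)))

# The phase ratio is exp(-1j * c * ln(scale)), so ln(scale) can be read off.
est_log_scale = -np.angle(m_dil / m_ref) / c

print("max magnitude difference:", max_mag_diff)
print("estimated scale:", np.exp(est_log_scale.mean()))
```

In this sketch the magnitude plays the role of the size-invariant distribution and the recovered factor plays the role of the separately encoded size constant; how the paper itself normalises and stores that constant is not specified in the abstract.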

Similar resources

Stabilised wavelet mellin transform: an auditory strategy for normalising sound-source size

We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...

Extracting Size and Shape Information of Sound Source in an Optimal Auditory Processing Model

We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...

An Auditory Vocoder Resynthesis of Speech from an Auditory Mellin Representation

An auditory Mellin transform has been proposed to segregate information about the size and shape of the vocal tract automatically; the process is also independent of glottal pitch. In this paper, we describe a method for resynthesizing speech from the Mellin representation using a high quality vocoder (STRAIGHT), and a nonlinear function to map between the two representations of speech. This en...

GENERAL SOLUTION OF ELASTICITY PROBLEMS IN TWO DIMENSIONAL POLAR COORDINATES USING MELLIN TRANSFORM

In this work, the Mellin transform method was used to obtain solutions for the stress field components in two dimensional (2D) elasticity problems in terms of plane polar coordinates. The Mellin transformation was applied to the biharmonic stress compatibility equation expressed in terms of the Airy stress potential function, and the boundary value problem transformed to an algebraic ...

The perception of scale in vowels

The resonating properties of many objects provide acoustical correlates which can be used to gain information about the objects. The acoustic signal provides not only shape information (what the sound means) but also size information (how small/big the object is relative to the population). A signal processing algorithm able to isolate both shape and size information is the Mellin transform. It...

Journal:
  • Speech Communication

Volume 36

Publication date: 2002